
feat(tools): add backward graph generation and validation tools #711

Open
Dayuxiaoshui wants to merge 5 commits into PaddlePaddle:develop from Dayuxiaoshui:develop

Dayuxiaoshui (Contributor) commented May 15, 2026

PR Overview

This PR fixes 4 critical issues in the backward_graph_extractor.py pipeline for generating backward computational graphs, adds a kernel_dedup.py tool for Triton kernel-level deduplication, and improves test_compiler.py compatibility with list-typed outputs from backward graphs.


Types of Samples Fixed

Issue 1: BatchNorm subgraphs crashing during backward graph generation

Affected samples: ultralytics/yolov6l_start2_end8_0, ultralytics/yolov9e-seg, and others containing BatchNorm layers.

Root cause: The original implementation uses module.train() mode, causing BatchNorm's running_mean/running_var to have requires_grad=True when passed to aot_module_simplified. However, _native_batch_norm_legit_no_training does not support gradient computation w.r.t. running_mean:

RuntimeError: not differentiable with respect to argument 'running_mean'

Fix: Switch to module.eval() mode and parse weight_meta.py original_name to identify running_mean/running_var/num_batches_tracked, excluding them from requires_grad.
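
The name-based filtering described in this fix can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the helper names `is_bn_buffer` and `select_grad_params` are hypothetical, and the example assumes that `original_name` values are dotted state-dict style keys such as `model.0.bn.running_mean`.

```python
# Field names of BatchNorm buffers that must not receive gradients.
BN_BUFFER_FIELDS = ("running_mean", "running_var", "num_batches_tracked")


def is_bn_buffer(original_name: str) -> bool:
    """Return True if the last dotted field names a BatchNorm buffer."""
    last_field = original_name.rsplit(".", 1)[-1]
    return last_field in BN_BUFFER_FIELDS


def select_grad_params(original_names):
    """Keep only the names that may be marked requires_grad=True."""
    return [name for name in original_names if not is_bn_buffer(name)]


# Hypothetical names, standing in for entries parsed from weight_meta.py.
names = [
    "model.0.conv.weight",
    "model.0.bn.weight",
    "model.0.bn.bias",
    "model.0.bn.running_mean",
    "model.0.bn.running_var",
    "model.0.bn.num_batches_tracked",
]
trainable = select_grad_params(names)
print(trainable)
```

Matching on the last dotted field rather than on a substring avoids excluding an unrelated parameter whose name merely contains `running_mean`.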

Issue 2: Input tensors corrupted by inplace operations

Affected samples: All backward graph generation.

Root cause: The original code reuses raw input tensors directly. Inplace ops (e.g., add_) mutate leaf tensors, causing gradient computation anomalies.

Fix: Apply detach().clone() to all input tensors.
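
The effect of `detach().clone()` can be shown with a small standalone sketch. The tensor names are illustrative only and the snippet is not taken from backward_graph_extractor.py; it only assumes a working PyTorch install.

```python
import torch

raw_input = torch.ones(3)

# detach() alone shares storage with the original, so in-place ops leak back.
alias = raw_input.detach()
alias.add_(1)
print(raw_input)  # raw_input was mutated through the alias

# detach().clone() yields an independent leaf tensor with its own storage.
raw_input = torch.ones(3)
safe_input = raw_input.detach().clone()
safe_input.add_(1)
print(raw_input)  # still all ones

# The copy can then be marked as requiring gradients for backward capture.
safe_input.requires_grad_(True)
(safe_input * 2).sum().backward()
print(safe_input.grad)
```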

Issue 3: Backward graph list outputs unsupported by test_compiler

Affected samples: Backward graphs returning [tensor], e.g., mmpose/LiteHRNet-18_start2_end6_0.

Root cause: Backward graphs output [tensor] lists. test_compiler's _align_output_device and torch.equal comparison functions only handle Tensor, crashing on list types:

TypeError: equal(): argument 'input' (position 1) must be Tensor, not list

Fix: Add recursive handling of nested list/tuple structures in test_compiler's output alignment and comparison functions.
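
A recursive treatment of nested list/tuple outputs might look like the sketch below. The helpers `map_nested` and `nested_equal` are hypothetical names, not the functions in test_compiler, and the demo uses plain numbers as stand-ins for tensors so that it runs without PyTorch. With real tensors, the leaf function could be something like `lambda t: t.to(device)` and the leaf comparison `torch.equal`.

```python
from typing import Any, Callable


def map_nested(fn: Callable[[Any], Any], value: Any) -> Any:
    """Apply fn to every leaf of a nested list/tuple, keeping the nesting."""
    if isinstance(value, (list, tuple)):
        mapped = [map_nested(fn, item) for item in value]
        return mapped if isinstance(value, list) else tuple(mapped)
    return fn(value)


def nested_equal(a: Any, b: Any, leaf_equal: Callable[[Any, Any], bool]) -> bool:
    """Compare two nested list/tuple structures leaf by leaf."""
    a_is_seq = isinstance(a, (list, tuple))
    b_is_seq = isinstance(b, (list, tuple))
    if a_is_seq and b_is_seq:
        return len(a) == len(b) and all(
            nested_equal(x, y, leaf_equal) for x, y in zip(a, b)
        )
    if a_is_seq or b_is_seq:
        return False  # one side is a container, the other a leaf
    return leaf_equal(a, b)


# Numbers stand in for tensors here; the structure mirrors a [tensor] output.
expected = [1.0, (2.0, [3.0])]
actual = map_nested(float, [1, (2, [3])])
same = nested_equal(expected, actual, leaf_equal=lambda x, y: x == y)
print(same)
```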

Issue 4: Missing graph_hash.txt prevents kernel extraction

Affected: All backward graph samples — 0 kernels extracted after successful compilation.

Root cause: GraphExtractor does not generate graph_hash.txt when saving models. triton_kernel_extractor requires original_graph/graph_hash.txt to trigger extraction.

Fix: GraphExtractor now computes SHA256 of model.py and writes graph_hash.txt automatically.
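
Computing the hash file can be sketched with the standard library alone. The function name `write_graph_hash` is hypothetical, and the sketch assumes that the raw bytes of model.py are hashed and that the hex digest is written next to it; the actual GraphExtractor code may normalize the content or lay out files differently.

```python
import hashlib
import tempfile
from pathlib import Path


def write_graph_hash(model_dir: Path) -> str:
    """Hash model.py with SHA-256 and store the hex digest in graph_hash.txt."""
    source = (model_dir / "model.py").read_bytes()
    digest = hashlib.sha256(source).hexdigest()
    (model_dir / "graph_hash.txt").write_text(digest)
    return digest


# Demo in a temporary directory with a stand-in model.py.
with tempfile.TemporaryDirectory() as tmp:
    sample_dir = Path(tmp)
    (sample_dir / "model.py").write_bytes(b"def forward(x):\n    return x + 1\n")
    digest = write_graph_hash(sample_dir)
    stored = (sample_dir / "graph_hash.txt").read_text()
    print(digest == stored, len(digest))
```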


Success Rate: Before vs. After

Before this PR (original backward_graph_extractor):

  • Subgraphs with BatchNorm: 0% (all crash with running_mean gradient error)
  • Typical subgraphs without BN: ~60-70% (affected by inplace inputs and train mode)

After this PR:

| Subgraph Type | Samples | Success | Failed | Success Rate |
| --- | --- | --- | --- | --- |
| typical float32 | 30 | 30 | 0 | 100% |
| typical float32 | 50 | 50 | 0 | 100% |
| fusible float32 | 30 | 28 | 2 | 93.3% |
| fusible float32 | 50 | 42 | 8 | 84% |
| typical backward compile + extract | 20 | 15 | 5 | 75% |

87.5% of fusible failures are due to output tensors without requires_grad (e.g., int64 indices/masks), a structural characteristic of fusible decomposition, not a code bug.


test_compiler Verification

Before: test_compiler crashes on backward graphs (missing weight_meta + list output). After:

| Subgraph Type | Generated | test_compiler Passed | test_compiler Failed |
| --- | --- | --- | --- |
| typical float32 | 30 | 30 (100%) | 0 |
| fusible float32 | 28 | 28 (100%) | 0 |

Zero false positives: No "Environment fluctuation detected" events.


Triton Kernel Dedup Tool

New tools/triton_kernel_extractor/kernel_dedup.py (invoked via dedup subcommand). Performs kernel-level dedup by hashing Triton source code content, complementary to graph-level graph_hash.txt dedup:

| Graph Type | Samples | Total Kernels | Unique | Dedup Rate |
| --- | --- | --- | --- | --- |
| Forward typical | 24 | 24 | 21 | 12.5% |
| Backward typical | 15 | 15 | 15 | 0% |
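
Content-based kernel dedup and the dedup rate reported above can be illustrated as follows. This is a sketch under stated assumptions, not kernel_dedup.py itself: the normalization (dropping trailing whitespace and blank lines), the function names, the file paths, and the tiny stand-in kernel sources are all hypothetical.

```python
import hashlib
from collections import defaultdict


def normalize_source(source: str) -> str:
    """Drop trailing whitespace and blank lines (an assumed normalization)."""
    lines = (line.rstrip() for line in source.splitlines())
    return "\n".join(line for line in lines if line)


def dedup_kernels(kernels: dict) -> dict:
    """Group kernel paths by the SHA-256 of their normalized source."""
    groups = defaultdict(list)
    for path, source in kernels.items():
        normalized = normalize_source(source).encode("utf-8")
        groups[hashlib.sha256(normalized).hexdigest()].append(path)
    return dict(groups)


def dedup_rate(total: int, unique: int) -> float:
    """Fraction of kernels that duplicate an already-seen kernel."""
    return (total - unique) / total if total else 0.0


# Stand-in sources; real inputs would be extracted Triton kernel files.
kernels = {
    "sample_a/triton_poi_fused_add_0.py": "def kernel(x):\n    return x + 1\n",
    "sample_b/triton_poi_fused_add_0.py": "def kernel(x):   \n\n    return x + 1\n",
    "sample_c/triton_poi_fused_mul_0.py": "def kernel(x):\n    return x * 2\n",
}
groups = dedup_kernels(kernels)
print(len(kernels), len(groups))
print(dedup_rate(24, 21))  # forward typical row: 24 kernels, 21 unique -> 0.125
```

Because the key is the kernel source rather than model.py, two different graphs that compile to the same kernel collapse into one group, which graph-level hashing cannot detect.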

Changed Files

| File | Change |
| --- | --- |
| graph_net/torch/sample_pass/backward_graph_extractor.py | module.eval() + BN param filtering + detach().clone() |
| graph_net/torch/extractor.py | Auto-generate graph_hash.txt on save |
| graph_net_bench/torch/test_compiler.py | Support nested list/tuple outputs |
| tools/triton_kernel_extractor/kernel_dedup.py | New: Triton kernel source content dedup tool |
| tools/triton_kernel_extractor/__main__.py | New dedup subcommand |

paddle-bot (Bot) commented May 15, 2026

Thanks for your contribution!

This commit introduces backward graph generation pipeline integrated with
GraphNet's test_compiler framework.

Changes:
- graph_net/torch/extractor.py: add try/except for capture_sparse_compute
  to support PyTorch versions where the config does not exist.
- graph_net/torch/sample_pass/backward_graph_extractor.py:
  - switch module from train() to eval() to avoid dropout/BN side effects
  - clone forward inputs with detach().clone() to avoid inplace modification
  - add _is_pure_shape_graph() to skip subgraphs with only shape ops
- tools/backward_graph_test.py:
  - batch backward FX Graph generation via aot_autograd
  - integrated test_compiler validation with auto-generated weight_meta.py
  - default GRAPH_NET_FLUCTUATION_DETECT_THRESHOLD=0.5 and trials=10
- tools/backward_kernel_dedup.py:
  - Triton kernel dedup analysis for backward graphs

Xreki (Collaborator) left a comment

The PR description needs to state, with examples, which types of samples had backward graph generation problems that this PR fixes. The data on how the backward graph generation success rate changes after applying this PR should also be added to the PR description.

      self.model_path, use_dummy_inputs=False, device=self.device
  )
- module.train()
+ module.eval()

Collaborator:

In eval mode, a backward graph won't be generated, will it?

Contributor Author:

model.eval() does not disable gradient computation; only torch.no_grad() / torch.inference_mode() do that. eval only changes the forward behavior of specific layers (dropout → identity, BatchNorm → uses running stats instead of batch stats), and backpropagation works as usual. Using eval mode is actually the better choice here.
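
The difference between eval mode and disabling gradients can be checked with a short standalone snippet. The toy model below is illustrative only and unrelated to the GraphNet samples.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4), nn.Dropout(0.5))
model.eval()  # changes forward behavior only; autograd stays enabled

x = torch.randn(2, 4)
out = model(x)
out.sum().backward()
print(out.requires_grad)                 # True
print(model[0].weight.grad is not None)  # True

# Only no_grad() / inference_mode() stop autograd from recording the graph.
with torch.no_grad():
    out_no_grad = model(x)
print(out_no_grad.requires_grad)         # False
```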

      self.model_path, use_dummy_inputs=False, device=self.device
  )
- module.train()
+ module.eval()

Collaborator:

There won't be a backward graph in eval mode, will there?

- module.train()
+ module.eval()

+ if self._is_pure_shape_graph(module):

Collaborator:

Cases like this can never be enumerated exhaustively, so I don't recommend adding this kind of check.

Contributor Author:

Agreed, removed. For pure-shape subgraphs (containing only view/reshape/transpose and similar ops), backward capture naturally returns empty because no output tensor is differentiable, so no extra pre-processing step is needed to skip them.

Comment thread tools/backward_graph_test.py Outdated
@@ -0,0 +1,538 @@
#!/usr/bin/env python3

Collaborator:

The existing script https://github.com/PaddlePaddle/GraphNet/blob/develop/graph_net/test/backward_graph_extractor.sh can be used directly for testing; there is no need to add another unit test.

Contributor Author:

Removed tools/backward_graph_test.py.

Comment thread tools/backward_kernel_dedup.py Outdated


def main():
    parser = argparse.ArgumentParser(description="Backward kernel dedup analysis.")

Collaborator:

What kind of backward kernel dedup does this code perform? Does it dedup by the graph_hash.txt of model.py? That does not need extra code either; the existing code can already do it.

…one tools

- Remove _is_pure_shape_graph() from backward_graph_extractor.py per
  reviewer feedback (incomplete op whitelist, not maintainable)
- Remove tools/backward_graph_test.py (use existing shell script
  graph_net/test/backward_graph_extractor.sh for batch processing)
- Remove tools/backward_kernel_dedup.py (use existing graph_hash.txt
  based dedup in graph_net/tools/deduplicated.py)
…tent

Add `kernel_dedup.py` and wire it as a `dedup` subcommand under
`tools.triton_kernel_extractor`.  This performs kernel-level dedup by
hashing normalized Triton kernel source (triton_poi_fused_xxx.py),
which is complementary to the existing graph-level dedup via graph_hash.txt.

Signed-off-by: Dayuxiaoshui <792179245@qq.com>
Signed-off-by: Dayuxiaoshui <792179245@qq.com>
- test_compiler: handle list/tuple outputs from backward graphs recursively
  in _align_output_device and output wrapping logic
- extractor: generate graph_hash.txt from model.py content when saving

Signed-off-by: Dayuxiaoshui <792179245@qq.com>
2 participants